Sentiment Analysis and the Crowd Economy

31-14/SEMTM0012
lecture-note

Sentiment describes an attitude or opinion towards something. It can be quantified as a value from -1 (negative) to +1 (positive), called the sentiment polarity.

The UK Competition and Markets Authority (CMA) estimated in 2015 that £22.31 billion a year of consumer spending is influenced by online reviews. More than half of this is on travel & hotels.

Michael Luca found that a one-star increase in a restaurant's Yelp rating leads to a 5-9% increase in revenue. The effect was concentrated in independent restaurants, with barely any effect on chain restaurants. Chain restaurants have lost market share as Yelp penetration has grown, suggesting that online reviews are substituting for more traditional forms of reputation, such as a chain's brand name.

Traditional methods of gathering opinions (focus groups, surveys, polls) are expensive. It is cheaper to use data that is already available online, such as reviews, posts and blogs. This is also much faster than traditional methods, so it can be used to gauge the impact of decisions in near real time.

Levels of analysis

Document Level

For a given document, quantify the overall attitude on the discussed topic.

Sentence/Phrase Level

Quantify the attitude of a specific sentence/phrase in a document.

Aspect/Feature Level

For a given document, quantify the sentiment on each aspect of the thing discussed in the document, e.g. keyboard, screen, battery, etc. in a laptop review.

Unsupervised Learning for Document-Level Analysis

For each adjective or adverb in a document, we can assign a semantic orientation relative to the words “excellent” and “poor”, based on how often it co-occurs with each of them in online search results. This means we don’t need to label any data ourselves.
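A minimal sketch of how such co-occurrence counts can be turned into a score, assuming the common pointwise mutual information (PMI) formulation SO(w) = PMI(w, “excellent”) - PMI(w, “poor”). The hit counts are invented for illustration (a real system would obtain them from a search engine), and the function names and the 0.01 smoothing constant are hypothetical choices. The document is then classified by the sign of the average orientation of its words.

```python
import math

# Hypothetical hit counts, made up for illustration. In practice they would come
# from search queries such as "<word> NEAR excellent" and "<word> NEAR poor".
HITS_EXCELLENT = 1_000_000
HITS_POOR = 1_000_000
HITS_NEAR = {
    # word: (hits near "excellent", hits near "poor")
    "superb": (800, 100),
    "bland": (50, 400),
    "quickly": (300, 250),
}


def semantic_orientation(word: str, smoothing: float = 0.01) -> float:
    """SO(word) = PMI(word, "excellent") - PMI(word, "poor").

    The hits(word) terms cancel, leaving
    log2(hits(word NEAR excellent) * hits(poor) / (hits(word NEAR poor) * hits(excellent))).
    The smoothing constant avoids dividing by zero for words never seen near "poor".
    """
    near_excellent, near_poor = HITS_NEAR[word]
    return math.log2(((near_excellent + smoothing) * HITS_POOR)
                     / ((near_poor + smoothing) * HITS_EXCELLENT))


def document_orientation(words: list[str]) -> float:
    """Average SO over the words we have counts for; its sign is the document's polarity."""
    scores = [semantic_orientation(w) for w in words if w in HITS_NEAR]
    return sum(scores) / len(scores) if scores else 0.0


review = ["superb", "bland", "quickly"]
print(document_orientation(review) > 0)  # True: the review leans positive
```

Note that SO is not bounded to [-1, +1]; here only its sign is used as the document's polarity, so it would need rescaling to be read as a polarity value on that range.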

The problem with doing this at word level is that it misses the larger picture: words like “not” invert the meaning of the word that follows, but a per-word score doesn’t take this into account.
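To make the problem concrete, here is a naive word-level scorer over a small lexicon. The lexicon and the function name are hypothetical, for illustration only. Because each word is scored on its own, negating the sentence does not change its score.

```python
# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"good": 1, "great": 1, "bad": -1, "boring": -1}


def word_level_score(text: str) -> int:
    """Sum per-word polarities, ignoring context (the naive word-level approach)."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())


print(word_level_score("The film was good"))      # 1
print(word_level_score("The film was not good"))  # 1: "not" is ignored
```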

Aspect-Level Analysis

For a given document we want to identify every quintuple expressed:

  • The object
  • The aspect
  • The sentiment
  • Who holds the sentiment
  • When it was expressed

We do this by:

  • Extract aspects
  • Group aspects that are synonyms
  • Classify sentiment
  • Extract entity, opinion holder, time

A simple algorithm:

  • Mark sentiment words as -1 or +1 using a lexicon
  • Identify sentiment shifters (not, never, etc.), which flip the polarity of the sentiment words they apply to
  • Identify ‘but’ clauses: if the sentiment on one side cannot be identified, it is assumed to be opposite to the sentiment on the other side
  • Sum the sentiment scores, weighting each inversely by its distance from the aspect word (closer words count more)
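The simple algorithm above can be sketched in Python for one sentence and a single-word aspect. This is a minimal sketch, not a reference implementation: the lexicon, the shifter list, the two-token shifter window, scoring only the aspect's own side of a ‘but’, and dividing each score by its distance to the aspect are all illustrative assumptions, and every name is hypothetical. Aspect extraction, synonym grouping and holder/time extraction are not shown.

```python
import re

# Toy lexicon and shifter list (hypothetical; a real system would use a full
# opinion lexicon with thousands of entries).
LEXICON = {"great": 1, "good": 1, "sharp": 1, "poor": -1, "bad": -1, "dim": -1}
SHIFTERS = {"not", "never", "no", "hardly"}
SHIFT_WINDOW = 2  # assumption: a shifter affects a word at most 2 tokens later


def tokenize(sentence: str) -> list[str]:
    return re.findall(r"[a-z']+", sentence.lower())


def token_scores(tokens: list[str]) -> list[int]:
    """Steps 1-2: +1/-1 from the lexicon, flipped if a shifter precedes it."""
    scores = []
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        window = tokens[max(0, i - SHIFT_WINDOW):i]
        if polarity and any(t in SHIFTERS for t in window):
            polarity = -polarity
        scores.append(polarity)
    return scores


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def aspect_score(sentence: str, aspect: str) -> float:
    """Sentiment towards a single-token aspect within one sentence."""
    tokens = tokenize(sentence)
    if aspect not in tokens:
        return 0.0
    scores = token_scores(tokens)
    pos = tokens.index(aspect)

    # Step 3: with a "but", only the aspect's own side is scored
    # (assumption: only the first "but" in the sentence is handled).
    scope = range(len(tokens))
    if "but" in tokens:
        cut = tokens.index("but")
        left, right = range(0, cut), range(cut + 1, len(tokens))
        scope, other = (left, right) if pos < cut else (right, left)
        if all(scores[i] == 0 for i in scope):
            # No sentiment word on the aspect's side: assume the opposite
            # polarity to the other side of "but".
            return float(-sign(sum(scores[i] for i in other)))

    # Step 4: sum sentiment scores, each divided by its distance to the aspect.
    return float(sum(scores[i] / abs(i - pos) for i in scope if i != pos))


sentence = "The screen is great but the battery is not good."
print(aspect_score(sentence, "screen"))   # 0.5
print(aspect_score(sentence, "battery"))  # approx. -0.333 ("good" flipped by "not")
print(aspect_score("Great screen, but the battery lasts two hours.", "battery"))  # -1.0
```

The last call shows the ‘but’ rule: “lasts two hours” contains no lexicon word, so the battery is assumed to take the opposite polarity to the positive “great screen”.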

Data Mining

Companies such as RavenPack, Topsy and Dataminr provide sentiment mining as a service.

Sockpuppetry

A sockpuppet is an online account that poses as an independent third party while concealing its connection to its real operator.

Examples

  • Ballot stuffing (voting with multiple identities)
  • Sybil attack (attacking a peer-to-peer network with fake identities)
  • Stealth marketing
  • Strawman (arguing against your own opinion in a deliberately weak, easily refuted way)
  • Astroturfing (making a message seem grassroots)

Crowd Economy

The crowd economy refers to businesses and other economic entities that obtain most of their value from the interconnection of a large number of people, e.g. Kickstarter, Wikipedia, cryptocurrencies, YouTube, Quora.

Q: How much UK consumer spending did the CMA estimate in 2015 to be influenced by online reviews? A: £22.31 billion a year, more than half of it on travel & hotels

Q: What revenue increase can a one-star increase in Yelp rating lead to? A: 5-9%, found by Michael Luca; the effect is mostly on independent restaurants.

Q: Why data mine instead of using traditional opinion-gathering methods? A: It is cheaper and much faster (even near real time)

Q: What are the three levels of sentiment analysis? A:

  • Document level
  • Sentence/Phrase level
  • Aspect/Feature level

Q: How do we do unsupervised learning for document-level sentiment analysis? A: We assign each adjective and adverb a semantic orientation based on how often it shows up with “excellent” and “poor” in online search results.

Q: What is the problem with unsupervised learning for document-level sentiment analysis? A: Misses the larger picture, especially due to words like “not” which invert the meaning of the following word.

Q: List 3 companies that provide sentiment mining as a service A: RavenPack, Topsy, Dataminr

Q: What is sockpuppetry? A: A social media account which poses as an independent third party unrelated to its real operator.

Q: What are examples of sockpuppetry uses? A:

  • Ballot stuffing (voting with multiple identities)
  • Sybil attack (attacking a peer-to-peer network with fake identities)
  • Stealth marketing
  • Strawman (arguing against your own opinion in a deliberately weak, easily refuted way)
  • Astroturfing (making a message seem grassroots)

Q: List 5 examples of the crowd economy A: Kickstarter, Wikipedia, cryptocurrencies, YouTube, Quora